Search Result

Self-adaptive Web crawler code generation method based on webpage source code structure comprehension

Yao LIU, Ru LIU, Yu ZHAI

Journal of Computer Applications 2023, 43 (6): 1779-1784. DOI: 10.11772/j.issn.1001-9081.2022060929

Frequent webpage redesigns change webpage source code, especially the element structures or attribute identifiers of target entities such as article dates, main body text or source organizations. These changes cause Web crawler code to fail and raise manual maintenance costs. To address these problems, a self-adaptive Web crawler code generation method based on webpage source code structure comprehension was proposed. Firstly, the corresponding Web crawler code was extracted by analyzing the change patterns of webpage structural characteristics. Secondly, the changes in the webpage source code and in the code were represented by the Encoder-Decoder model. By fusing the semantic features of the webpage source code structure, the features of webpage source code changes and the features of webpage code changes, an adaptive code generation model was obtained. Finally, the perception, generation and activation mechanisms of the adaptive system were improved to form a Web crawler system with adaptive processing capability. Experiments showed that, compared with the TF-IDF+Seq2Seq and TriDNR+Seq2Seq models, the proposed adaptive code generation model was superior in representing webpage source code changes and effective in code generation, with a final accuracy of 78.5%. The proposed method can solve the Web crawler code operation problems caused by webpage source code changes, and offers a new approach to adaptive processing in Web crawling, a technique for Web resource acquisition.
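As a rough illustration of what a "webpage source code change" can look like to a crawler, the sketch below compares the tag paths of two page versions. It is not the Encoder-Decoder model described in the abstract. The class `TagPathCollector`, the functions `tag_paths` and `structure_changes`, and the path notation are all hypothetical names chosen for this example, and only the `class` attribute is treated as an identifier.

```python
from html.parser import HTMLParser
import difflib


class TagPathCollector(HTMLParser):
    """Records one path string per start tag, e.g. 'html/body/span.date'."""

    # Void elements never get a closing tag, so they are not pushed on the stack.
    VOID = {"br", "img", "meta", "link", "input", "hr"}

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = []

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        label = f"{tag}.{cls}" if cls else tag
        self.paths.append("/".join(self.stack + [label]))
        if tag not in self.VOID:
            self.stack.append(label)

    def handle_endtag(self, tag):
        # Pop back to the most recent open element with this tag name, if any.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].split(".")[0] == tag:
                del self.stack[i:]
                break


def tag_paths(html):
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths


def structure_changes(old_html, new_html):
    """Returns (operation, old_paths, new_paths) for every non-equal diff span."""
    old, new = tag_paths(old_html), tag_paths(new_html)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (op, old[i1:i2], new[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]
```

A crawler rule written against `span.date` would silently fail on a redesigned page where that element became `time.pub-date`; a diff of this kind is one simple way to surface such a change before any code is regenerated.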

Algorithm path self-assembling model for business requirements

Yao LIU, Xin TONG, Yifeng CHEN

Journal of Computer Applications 2023, 43 (6): 1768-1778. DOI: 10.11772/j.issn.1001-9081.2022060944

Algorithm platforms, as a way of implementing automatic machine learning, have attracted wide attention in recent years. However, the business processes of these platforms must be built manually, model calling is inflexible, and the platforms cannot automatically construct customized algorithms for specific business requirements. To address these problems, an algorithm path self-assembling model for business requirements was proposed. Firstly, the sequence features and structural features of code were modeled simultaneously based on Graph Convolutional Network (GCN) and word2vec representation. Secondly, functions in the algorithm set were further discovered through a clustering model, and the obtained function subsets were used to prepare for the discovery of algorithm component paths between subsets. Finally, based on the relationship discovery model and ranking model trained with prior knowledge, the self-assembled paths of candidate code components were mined, thus realizing algorithm code self-assembling. Under the proposed evaluation indicators, the best result of the proposed algorithm path self-assembling model was 0.8, while that of the baseline model Okapi BM25+word2vec was 0.21. To a certain extent, the proposed model solves the problem of missing code structure and semantic information in traditional code representation methods, and lays a foundation for research on refined algorithm process self-assembling and the automatic construction of algorithm pipelines.
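For readers unfamiliar with the baseline named above, the following is a minimal sketch of Okapi BM25 scoring alone, without the word2vec part. It uses a common smoothed IDF variant, and the defaults `k1=1.5` and `b=0.75` are conventional values rather than settings taken from the paper. The function name `bm25_scores` and the token-list input format are assumptions made for this example, and `docs` is assumed to be non-empty.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Scores each document (a list of tokens) against a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF; the +1 inside the log keeps it non-negative.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Because BM25 matches only on shared tokens, a code component whose description uses different wording from the query scores zero, which is the kind of missing structural and semantic information the abstract refers to.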

New scheme for image transmission based on SPIHT

FU Yao, LIU Qing-li

Journal of Computer Applications 2012, 32 (04): 1144-1146. DOI: 10.3724/SP.J.1087.2012.01144

A new real-time image transmission scheme based on Set Partitioning In Hierarchical Trees (SPIHT) was proposed. Firstly, the image data were transformed by wavelet transform. Secondly, to resist error propagation during transmission, the wavelet coefficients were separated into small blocks and each block was encoded by SPIHT. Finally, to improve the quality of the reconstructed image, the wavelet coefficients of the highest level in every block were transmitted repeatedly. To improve the throughput of the image transmission system, an optimum frame length was proposed. Both theoretical analysis and simulation results show that the proposed scheme provides stronger error resilience than the traditional SPIHT-based scheme, and can improve the peak signal-to-noise ratio of the reconstructed image by about 10 dB.
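To put the reported gain in context, the sketch below computes the peak signal-to-noise ratio (PSNR) for pixel sequences using the standard definition, 10·log10(peak² / MSE). The function name `psnr` and the flat-list input format are assumptions made for this example, and 8-bit pixels (peak value 255) are assumed.

```python
import math


def psnr(original, reconstructed, peak=255):
    """PSNR in dB between two equal-length sequences of pixel values."""
    if not original or len(original) != len(reconstructed):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


# Two reconstructions of the same flat 10-pixel row (illustrative values only).
row = [100] * 10
worse = [110, 90] * 5          # every pixel off by 10, so MSE = 100
better = [110] + [100] * 9     # one pixel off by 10, so MSE = 10
gain_db = psnr(row, better) - psnr(row, worse)
```

Since PSNR is logarithmic in the mean squared error, a gain of about 10 dB corresponds to roughly a tenfold reduction in mean squared error, as the two illustrative reconstructions show.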
